---
title: "summary for plastic waste makers index data"
output: html_notebook
---

---
purpose of notebook
---

  (i) summarize all insights and ideas from the other notebooks, as well as good exploratory plots
  
---
information
---

name: makeovermonday_2021w22
link: https://data.world/makeovermonday/2021w22
title: 2021/W22: The Plastic Waste Makers Index
Data Source: [Minderoo](https://www.minderoo.org/plastic-waste-makers-index/data/indices/producers/)


---
domain information 
---

 (i) Production of single-use plastic (SUP) and contribution to single-use plastic waste are estimated in million metric tons for the year 2019.
 (i) Rigid packaging is packaging that features heavier and often stronger materials than flexible packaging. Forms of rigid packaging materials include but are not limited to: glass, hard plastics, cardboard, metal, and so on. Rigid packaging supplies are usually more expensive than their flexible alternatives, and most have significantly higher carbon footprints than flexible packaging. see https://www.industrialpackaging.com/blog/flexible-vs-rigid-packaging
 (i) Flexible packaging includes all malleable packaging. Common examples include shrink film, stretch film, flexible pouches, seal bands, blister or skin packs, and clamshells. More generally, flexible packaging includes any protective packaging made from plastic, paperboard, paper, foil, wax-coated paperboard, and similar materials, or combinations of these materials. see https://www.industrialpackaging.com/blog/flexible-vs-rigid-packaging
 (i) In-scope polymers: single-use plastics can, in theory, be produced from over a dozen polymer families. However, Minderoo estimates that in 2019 close to 90 per cent of all single-use plastics by mass were produced from just five polymers: polypropylene (PP), high-density polyethylene (HDPE), low-density polyethylene (LDPE), linear low-density polyethylene (LLDPE), and polyethylene terephthalate resin (PET) (Figure M2 of the report). see https://cdn.minderoo.org/content/uploads/2021/05/18065501/20210518-Plastic-Waste-Makers-Index.pdf
  
---
summary highlights
---
  


---
stories
---



---
load packages
---
```{r load packages, include=FALSE}
library(tidyverse) # tidy data frame
library(plotly) # make ggplots interactive
```

---
overview
---
```{r}
head(plastic)
```

```{r}
summary(plastic)
```

---
observations from clean nb
---

  (i) columns: rank                                         numeric, ordered, unique, can serve as identifier, rank of producer according to index
               polymer_producer                             string, unique identifier, name of producer
               no_of_assets                                 numeric, metric, number of assets of the producer
               production_of_in_scope_polymers              numeric, metric in million metric tons, production of the in-scope polymers (see domain information)
               flexible_format_contribution_to_sup_waste    numeric, metric in million metric tons, flexible form of contribution to sup waste
               rigid_format_contribution_to_sup_waste       numeric, metric in million metric tons, rigid form of contribution to sup waste
               total_contribution_to_sup_waste              numeric, metric in million metric tons, total contribution is the sum of flexible and rigid
  (i) no missing values at all; the dataset is very small (100 producers)
  (i) no duplicated rows
  (i) no changes were made to the dataset
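
The observations above (no missing values, no duplicated rows, unique rank and producer, and total = flexible + rigid) can be re-checked in a few lines. The following is a minimal base-R sketch that assumes the column names listed above. The `toy` data frame and the `check_plastic()` helper are hypothetical and exist only for illustration; the values are made up and do not come from the index. The published figures are rounded to 0.1 million metric tons, so the sum check allows a tolerance. The default of 0.1 is an assumed choice.

```r
# Hypothetical toy data with the same column names as `plastic`;
# the numbers are illustrative only, not values from the index.
toy <- data.frame(
  rank = 1:4,
  polymer_producer = c("Producer A", "Producer B", "Producer C", "Producer D"),
  no_of_assets = c(20, 9, 4, 2),
  production_of_in_scope_polymers = c(6.0, 2.1, 0.9, 0.4),
  flexible_format_contribution_to_sup_waste = c(3.0, 0.6, 0.2, 0.1),
  rigid_format_contribution_to_sup_waste = c(2.1, 0.5, 0.2, 0.2),
  total_contribution_to_sup_waste = c(5.1, 1.1, 0.4, 0.3)
)

# Hypothetical helper: returns one TRUE/FALSE per data-quality check
check_plastic <- function(df, tol = 0.1) {
  # gap between the reported total and the sum of both formats
  gap <- df$total_contribution_to_sup_waste -
    (df$flexible_format_contribution_to_sup_waste +
       df$rigid_format_contribution_to_sup_waste)
  c(
    no_missing      = !anyNA(df),
    no_dup_rows     = !anyDuplicated(df),
    rank_unique     = !anyDuplicated(df$rank),
    producer_unique = !anyDuplicated(df$polymer_producer),
    # tolerance absorbs the rounding of the published values
    total_is_sum    = all(abs(gap) <= tol + 1e-9)
  )
}

check_plastic(toy)
```

If the observations from the clean notebook hold, `check_plastic(plastic)` on the real data should return TRUE for every check.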

---
insights from describe uni
---

  (i) no_of_assets is strongly right-skewed (Poisson-like): most producers have only up to 9 assets (median = 6), some have up to 29 (upper fence = 26), and only a few (outliers) lie above that, with up to 82 assets
  (i) production_of_in_scope_polymers is right-skewed and looks very similar to no_of_assets: median is 0.9, upper fence is 3.4, max is 11.6
      -> might correlate with no_of_assets?
  (i) flexible_format_contribution_to_sup_waste is right-skewed and looks very similar to no_of_assets: median is 0.2, upper fence is 1.1, max is 4.7
  (i) rigid_format_contribution_to_sup_waste is right-skewed and looks very similar to no_of_assets: median is 0.2, upper fence is 1.1, max is 4.5;
      very similar to flexible_format_contribution_to_sup_waste, but with fewer outliers
  (i) total_contribution_to_sup_waste is right-skewed and again looks very similar to no_of_assets: median is 0.45, upper fence is 1.9, max is 5.9
      it is the sum of the flexible and rigid formats
  (i) the ratio of total_contribution_to_sup_waste to production_of_in_scope_polymers lies between 0.3 and 1.0 with a median of 0.5; most values lie between 0.4 and 0.6, but there is a pronounced spike at 1.0 (15 producers)
  (i) comparing rigid_format and flexible_format shows that the distributions are similar up to the upper fence of 1.1, but flexible has more large outliers (> 3)
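
Several of the insights above quote an "upper fence". This sketch assumes the fence follows the usual Tukey box-plot rule: the cut-off is Q3 + 1.5 × IQR, and the whisker is drawn to the largest observation at or below that cut-off. The base-R code reports both numbers and the values flagged as outliers. The helper name `tukey_upper()` and the toy vector are hypothetical. Quartile conventions differ between tools (R's default `quantile()` type 7 is also what ggplot2's box plots use), so the results may differ slightly from the values shown in the interactive plotly plots.

```r
# Hypothetical helper for the Tukey box-plot rule:
# values above Q3 + 1.5 * IQR are flagged as outliers.
tukey_upper <- function(x) {
  q3 <- unname(quantile(x, 0.75, type = 7))
  threshold <- q3 + 1.5 * IQR(x, type = 7)
  list(
    threshold   = threshold,               # the rule's cut-off
    whisker_end = max(x[x <= threshold]),  # largest non-outlier value
    outliers    = sort(x[x > threshold])   # values beyond the cut-off
  )
}

# Toy vector for illustration (not data from the index)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
tukey_upper(x)
# threshold 14.5, whisker_end 9, outliers 30
```

If the plots used the Tukey rule, `tukey_upper(plastic$total_contribution_to_sup_waste)` should reproduce the quoted fence up to rounding.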

```{r}
name = 'total_contribution_to_sup_waste'
df <- plastic %>% rename(value = total_contribution_to_sup_waste) %>% select(value)

# https://ggplot2.tidyverse.org/reference/geom_dotplot.html
dotplot <- df %>%
  ggplot(aes(x = value)) +
    # geom_density() +
    geom_histogram(binwidth = 0.1) +
    # geom_dotplot(method="histodot", stackgroups = TRUE, stackratio = 1.1, dotsize = 1.2, binwidth = 1) +
    theme_minimal() +
    scale_y_continuous(breaks = NULL) 
dotplot <- ggplotly(dotplot) %>% layout(yaxis = list(showticklabels = FALSE, showgrid = FALSE))

boxplot <- df %>%
  ggplot(aes(x = 1, y = value)) +
    geom_boxplot() +
    theme_minimal() +
    coord_flip() +
    ggtitle(paste("distribution of", name, sep=" ")) +
    scale_y_continuous(breaks = NULL) 
boxplot <- ggplotly(boxplot) %>% layout(yaxis = list(showticklabels = FALSE, showgrid = FALSE))

# https://ggplot2.tidyverse.org/reference/geom_qq.html 
plot_qq <- df %>%
  ggplot(aes(sample = value)) +
    geom_qq(alpha = 0.5) +
    geom_qq_line() +
    coord_flip() +
    theme_minimal()
plot_qq <- ggplotly(plot_qq) %>% layout(yaxis = list(showticklabels = FALSE, showgrid = FALSE))

# https://plotly.com/r/subplots/
fig <- subplot(dotplot, boxplot, plot_qq, nrows = 3, margin = 0, heights = c(0.5, 0.2, 0.3), shareX = TRUE) 

fig
```
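
The ratio insight above (SUP waste relative to in-scope polymer production) is not computed in any chunk of this notebook. The following is a minimal base-R sketch. It assumes the ratio is defined as `total_contribution_to_sup_waste / production_of_in_scope_polymers`, and the toy vectors are illustrative only. Both columns are rounded to 0.1, so small producers may land exactly on 1.0 through rounding alone. This is worth keeping in mind when interpreting the spike at 1.0.

```r
# Toy values (illustrative only, not from the index)
production  <- c(6.0, 2.0, 0.9, 0.4, 0.2)
total_waste <- c(3.0, 1.0, 0.5, 0.2, 0.2)

# Share of in-scope polymer production that ends up as SUP waste
ratio <- total_waste / production

summary(ratio)
# Number of producers whose ratio equals 1.0 (up to floating-point error)
sum(abs(ratio - 1) < 1e-9)
# Counts per rounded ratio, to spot spikes such as the one at 1.0
table(round(ratio, 1))
```

With the real data, the same lines apply to `plastic$total_contribution_to_sup_waste / plastic$production_of_in_scope_polymers`.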
```{r}
name = c('flexible_format_contribution_to_sup_waste', 'rigid_format_contribution_to_sup_waste')
df <- plastic %>% rename(flexible = flexible_format_contribution_to_sup_waste, rigid = rigid_format_contribution_to_sup_waste) %>% select(flexible, rigid) %>% pivot_longer(cols = c(flexible,rigid))

boxplot <- df %>%
  ggplot(aes(x = name, y = value, colour = name)) +
    geom_boxplot() +
    theme_minimal() +
    coord_flip() +
    ggtitle(paste("compare", name[1], "and", name[2])) +
    scale_y_continuous(breaks = NULL) 
boxplot <- ggplotly(boxplot) %>% layout(yaxis = list(showticklabels = FALSE, showgrid = FALSE))

# https://ggplot2.tidyverse.org/reference/geom_dotplot.html
dotplot <- df %>%
  ggplot(aes(x = value, fill = name)) +
    # geom_density() +
    geom_histogram(binwidth = 0.1, alpha = 0.5, position = "identity") +
    # geom_dotplot(method="histodot", stackgroups = TRUE, stackratio = 1, dotsize = 0.23, binwidth = 0.1) +
    theme_minimal() +
    scale_y_continuous(breaks = NULL) 
dotplot <- ggplotly(dotplot) %>% layout(yaxis = list(showticklabels = FALSE, showgrid = FALSE))

# https://ggplot2.tidyverse.org/reference/geom_qq.html 
plot_qq <- df %>%
  ggplot(aes(sample = value, colour = name)) +
    geom_qq(alpha = 0.5) +
    geom_qq_line(alpha = 0.5) +
    coord_flip() +
    theme_minimal() 
plot_qq <- ggplotly(plot_qq) %>% layout(yaxis = list(showticklabels = FALSE, showgrid = FALSE))

# https://plotly.com/r/subplots/
fig <- subplot(dotplot, boxplot, plot_qq, nrows = 3, margin = 0, heights = c(0.5, 0.2, 0.3), shareX = TRUE) 

fig
```
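
The flexible vs rigid comparison in the insights above (flexible has more large outliers, > 3) is read off the plots by eye. The following is a minimal base-R sketch that puts numbers on it per format. It assumes a long-format data frame shaped like the `pivot_longer()` output in the chunk above, with columns `name` and `value`. The toy values are illustrative only, and the threshold of 3 mirrors the insight.

```r
# Toy long-format data shaped like the pivot_longer() output above
# (columns `name` and `value`); the values are illustrative only.
long <- data.frame(
  name  = rep(c("flexible", "rigid"), each = 6),
  value = c(0.1, 0.2, 0.4, 0.9, 3.5, 4.7,
            0.1, 0.2, 0.5, 0.8, 1.2, 4.5)
)

# Per format: median, upper quartile and count of very large contributions (> 3)
per_format <- sapply(split(long$value, long$name), function(v) {
  c(median = median(v),
    q3     = unname(quantile(v, 0.75)),
    above3 = sum(v > 3))
})
per_format
```

The result is a small matrix with one column per format, which is easier to quote in the insights than values read off the histograms.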

---
insights from describe multi
---
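
The univariate notes raise the question of whether production_of_in_scope_polymers correlates with no_of_assets. Both columns are strongly right-skewed with a few very large producers. A rank-based (Spearman) correlation is therefore a reasonable first check alongside Pearson. This choice of method is an assumption, not something taken from the other notebooks. The following is a minimal base-R sketch with toy values that are illustrative only.

```r
# Toy values (illustrative only, not from the index)
assets     <- c(2, 3, 5, 6, 9, 12, 25, 80)
production <- c(0.3, 0.4, 0.9, 0.7, 1.5, 1.8, 4.0, 11.0)

# Pearson is sensitive to the few very large producers;
# Spearman uses only ranks, so it is robust to those outliers.
pearson  <- cor(assets, production, method = "pearson")
spearman <- cor(assets, production, method = "spearman")
c(pearson = pearson, spearman = spearman)

# Optional test on the ranks (exact p-value for small n without ties)
cor.test(assets, production, method = "spearman")
```

With the real data, the two columns would be `plastic$no_of_assets` and `plastic$production_of_in_scope_polymers`. A large gap between the Pearson and Spearman values would suggest that a few outliers drive the linear correlation.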






